A Case for Abstract Cost Models for Distributed Execution of Analytics Operators

نویسندگان

  • Rundong Li
  • Ningfang Mi
  • Mirek Riedewald
  • Yizhou Sun
  • Yi Yao
چکیده

We consider data analytics workloads on distributed architectures, in particular clusters of commodity machines. To find a job partitioning that minimizes running time, a cost model, which we more accurately refer to as makespan model, is needed. In attempting to find the simplest possible, but sufficiently accurate, such model, we explore piecewise linear functions of input, output, and computational complexity. They are abstract in the sense that they capture fundamental algorithm properties, but do not require explicit modeling of system and implementation details such as the number of disk accesses. We show how the simplified functional structure can be exploited by directly integrating the model into the makespan optimization process, reducing complexity by orders of magnitude. Experimental results provide evidence of good prediction quality and successful makespan optimization across a variety of cluster architectures.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

NScale: Neighborhood-centric Analytics on Large Graphs

There is an increasing interest in executing rich and complex analysis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego network analysis, motif counting in biological networks, finding social circles, personalized recommendations, link prediction, anomaly de...

متن کامل

Building Efficient and Cost-Effective Cloud-based Big Data Management Systems

Title of dissertation: BUILDING EFFICIENT AND COST-EFFECTIVE CLOUD-BASED BIG DATA MANAGEMENT SYSTEMS Abdul Hussain Quamar, Doctor of Philosophy, 2015 Dissertation directed by: Professor Amol Deshpande Department of Computer Science In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a pl...

متن کامل

Runtime Prediction for Scale-Out Data Analytics

Many analytics applications generate mixed workloads, i.e., workloads comprised of analytical tasks with different processing characteristics including data pre-processing, SQL, and iterative machine learning algorithms. Examples of such mixed workloads can be found in web data analysis, social media analysis, and graph analytics, where they are executed repetitively on large input datasets (e....

متن کامل

Afterburner: The Case for In-Browser Analytics

This paper explores the novel and unconventional idea of implementing an analytical RDBMS in pure JavaScript so that it runs completely inside a browser with no external dependencies. Our prototype, called Afterburner, generates compiled query plans that exploit typed arrays and asm.js, two relatively recent advances in JavaScript. On a few simple queries, we show that Afterburner achieves comp...

متن کامل

A New Bi-Objective Model for a Multi-Mode Resource-Constrained Project Scheduling Problem with Discounted Cash Flows and four Payment Models

The aim of a multi-mode resource-constrained project scheduling problem (MRCPSP) is to assign resource(s) with the restricted capacity to an execution mode of activities by considering relationship constraints, to achieve pre-determined objective(s). These goals vary with managers or decision makers of any organization who should determine suitable objective(s) considering organization strategi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017